Neural Machine Translation of Rare Words with Subword Units



Abstract

Neural machine translation (NMT) models typically operate with a fixed vocabulary, so the translation of rare and unknown words is an open problem. Previous work addresses this problem through back-off dictionaries. In this paper, we introduce a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units, based on the intuition that various word classes are translatable via smaller units than words, for instance names (via character copying or transliteration), compounds (via compositional translation), and cognates and loanwords (via phonological and morphological transformations). We discuss the suitability of different word segmentation techniques, including simple character n-gram models and a segmentation based on the byte pair encoding compression algorithm, and empirically show that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English→German and English→Russian by 1.1 and 1.3 BLEU, respectively.
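
The byte pair encoding segmentation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual formulation: start from characters plus an end-of-word marker, then repeatedly merge the most frequent adjacent symbol pair. The function names (`learn_bpe`, `segment`), the `</w>` marker, and the toy word counts are illustrative assumptions and are not taken from this record or from the authors' released code.

```python
from collections import Counter

END = "</w>"  # assumed end-of-word marker, so merges cannot cross word boundaries


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn an ordered list of merge operations from word frequencies."""
    vocab = {tuple(word) + (END,): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word by replaying the merges in learned order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = next(iter(merge_pair(pair, {symbols: 1})))
    return list(symbols)


# Hypothetical toy corpus: word -> frequency.
TOY_CORPUS = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(TOY_CORPUS, num_merges=5)
pieces = segment("lowest", merges)
```

On this toy input, the unseen word "lowest" is split into the two learned units `low` and `est</w>`, which is the open-vocabulary behavior the abstract describes: a rare or unknown word is represented as a sequence of known subword units rather than as a single out-of-vocabulary token.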


Bibliographic details

Authors: Sennrich, Rico; Haddow, Barry; Birch, Alexandra
Year: 2016
Format: PDF
Language: English

Similar literature

1. Finding Better Subwords for Tibetan Neural Machine Translation [J]. Li Yachao, Jiang Jing, Jia Yangji. ACM Transactions on Asian and Low-Resource Language Information Processing. 2021, no. 2.
2. A Hierarchical Clustering Approach to Fuzzy Semantic Representation of Rare Words in Neural Machine Translation [J]. Yang Muyun, Liu Shujie, Chen Kehai. IEEE Transactions on Fuzzy Systems. 2020, no. 5.
3. Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation [J]. Gronroos Stig-Arne, Virpioja Sami, Kurimo Mikko. Machine Translation. 2020, no. 4.
4. Neural Machine Translation of Rare Words with Subword Units [C]. Rico Sennrich, Barry Haddow, Alexandra Birch. Annual Meeting of the Association for Computational Linguistics. 2016.
5. Evolving neural net circuit modules to detect characters of the alphabet and sequences of characters (words) using the cellular automata module-brain machine [D]. DeCesare, Derek. 2001.
6. An ensemble of neural models for nested adverse drug events and medication extraction with subwords [O]. Meizhi Ju, Nhung T H Nguyen, Makoto Miwa. 2020.
7. Neural Machine Translation of Rare Words with Subword Units [O]. Sennrich, Rico; Haddow, Barry; Birch, Alexandra. 2016.
8. Modeling words with subword units in an articulatorily constrained speech recognition algorithm [R]. Hogden, J. 1997.
